Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models
Abstract
Unsupervised word sense disambiguation (WSD) methods are an attractive approach to all-words WSD because they do not rely on expensive annotated data. Unsupervised estimates of sense frequency have been shown to be very useful for WSD because word sense distributions are skewed. This paper presents a fully unsupervised, topic modelling-based approach to sense frequency estimation that is highly portable across corpora and sense inventories: it applies to any part of speech and does not require a hierarchical sense inventory, parsing, or parallel text. We demonstrate the effectiveness of the method on the tasks of predominant sense learning and sense distribution acquisition, and on the novel tasks of detecting senses that are not attested in the corpus and identifying novel senses in the corpus that are not captured in the sense inventory.
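To make the four tasks named in the abstract concrete, here is a minimal Python sketch. It is not the paper's method: it assumes the topic modelling step has already run and that each usage of the target word carries a label that is either a sense from the inventory or an unaligned topic. The function names, the example inventory, and the label `topic_7` are all hypothetical.

```python
from collections import Counter


def sense_distribution(assigned, inventory):
    """Relative frequency of each inventory sense among the labelled usages.

    Labels outside the inventory (assumed to be unaligned topics) are ignored.
    """
    counts = Counter(label for label in assigned if label in inventory)
    total = sum(counts.values())
    if total == 0:
        return {sense: 0.0 for sense in inventory}
    return {sense: counts[sense] / total for sense in inventory}


def predominant_sense(dist):
    """The sense with the highest estimated frequency."""
    return max(dist, key=dist.get)


def unattested_senses(dist, threshold=0.0):
    """Inventory senses whose estimated frequency is at or below the threshold."""
    return [sense for sense, p in dist.items() if p <= threshold]


def novel_labels(assigned, inventory):
    """Labels seen in the corpus that match no inventory sense."""
    known = set(inventory)
    return sorted({label for label in assigned if label not in known})


# Hypothetical data: ten usages of "bank", labelled by an assumed upstream step.
inventory = ["bank_river", "bank_finance", "bank_tilt"]
assigned = ["bank_finance"] * 6 + ["bank_river"] * 2 + ["topic_7"] * 2

dist = sense_distribution(assigned, inventory)
top = predominant_sense(dist)
missing = unattested_senses(dist)
novel = novel_labels(assigned, inventory)
```

In this toy data the estimated distribution is skewed towards one sense, one inventory sense never occurs, and one label has no counterpart in the inventory, which mirrors the predominant sense, unattested sense, and novel sense tasks respectively.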
Related resources
From the Culinary to the Political Meaning of "quenelle" : Using Topic Models For Identifying Novel Senses (De la quenelle culinaire à la quenelle politique : identification de changements sémantiques à l'aide des Topic Models) [in French]
In this study we explore topic modeling for the automatic detection of new senses of known words. We apply methods developed in previous work for English (Lau et al., 2012, 2014) to a recent case of a new word sense in French, namely the appearance of a new meaning, that of a gesture, for the word « quenelle ». Our experiments illustrate the potential of this approach at learning word senses, ...
Word Sense Induction for Novel Sense Detection
We apply topic modelling to automatically induce word senses of a target word, and demonstrate that our word sense induction method can be used to automatically detect words with emergent novel senses, as well as token occurrences of those senses. We start by exploring the utility of standard topic models for word sense induction (WSI), with a pre-determined number of topics (=senses). We next ...
KSU KDD: Word Sense Induction by Clustering in Topic Space
We describe our language-independent unsupervised word sense induction system. This system uses only topic features to cluster different word senses in their global context topic space. Using unlabeled data, this system trains a latent Dirichlet allocation (LDA) topic model, then uses it to infer the topic distributions of the test instances. By clustering these topic distributions in their top...
Multiplicity and word sense: evaluating and learning from multiply labeled word sense annotations
Supervised machine learning methods to model word sense often rely on human labelers to provide a single, ground truth sense label for each word in its context. The fine-grained sense label inventories preferred by lexicographers have been argued to lead to lower annotation reliability in measures of agreement among two or three human labelers (annotators). We hypothesize that annotators can ag...
Automatically Identifying Changes in the Semantic Orientation of Words
The meanings of words are not fixed but in fact undergo change, with new word senses arising and established senses taking on new aspects of meaning or falling out of usage. Two types of semantic change are amelioration and pejoration; in these processes a word sense changes to become more positive or negative, respectively. In this first computational study of amelioration and pejoration we ad...
Publication date: 2014